Journals
  Publication Years
  Keywords
Search within results Open Search
Please wait a minute...
For Selected: Toggle Thumbnails
Web page blacklist discrimination method based on attention mechanism and ensemble learning
ZHOU Chaoran, ZHAO Jianping, MA Tai, ZHOU Xin
Journal of Computer Applications    2021, 41 (1): 133-138.   DOI: 10.11772/j.issn.1001-9081.2020081379
Abstract336)      PDF (1076KB)(408)       Save
As one of the main Internet applications, search engine can retrieve and return effective information from Internet resources according to user needs. However, the obtained returned list often contains noisy information such as advertisements and invalid Web pages, which interfere the user's search and query. Aiming at the complex structural features and rich semantic information of Web pages, a Web page blacklist discrimination method based on attention mechanism and ensemble learning was proposed. And, by using this method, an Ensemble learning and Attention mechanism-based Convolutional Neural Network (EACNN) model was built to filter useless Web pages. First, according to different categories of HTML tag data on Web pages, multiple Convolutional Neural Network (CNN) base learners based on attention mechanism were established. Second, an ensemble learning method based on Web page structural features was used to perform different weight computation to the output results of different base learners to realize the construction of EACNN. Finally, the output result of EACNN was used as the analysis result of Web page content to realize the discrimination of Web page blacklist. The proposed method focuses on the semantic information of Web pages through attention mechanism, and introduces the structural features of Web pages through ensemble learning. Experimental results show that, compared with baseline models such as Support Vector Machine (SVM), K-Nearest Neighbor ( KNN), CNN, Long Short-Term Memory (LSTM) network, Gate Recurrent Unit (GRU) and Attention-based CNN (ACNN), EACNN has the highest accuracy (0.97), recall (0.95) and F 1 score (0.96) on the geographic information field-oriented discrimination dataset constructed. It verifies the advantages of EACNN in the task of discriminating Web page blacklist.
Reference | Related Articles | Metrics